Overview

Dataset statistics

Number of variables: 13
Number of observations: 3832
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 389.3 KiB
Average record size in memory: 104.0 B

Variable types

Numeric: 9
Categorical: 4

Alerts

city is highly correlated with city_development_index (High correlation)
city_development_index is highly correlated with city (High correlation)
relevent_experience is highly correlated with last_new_job (High correlation)
last_new_job is highly correlated with relevent_experience (High correlation)
df_index has unique values (Unique)
major_discipline has 53 (1.4%) zeros (Zeros)
experience has 107 (2.8%) zeros (Zeros)
company_size has 371 (9.7%) zeros (Zeros)
company_type has 137 (3.6%) zeros (Zeros)
last_new_job has 1699 (44.3%) zeros (Zeros)
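
The percentages in the Zeros alerts are each zero count divided by the number of observations. The following is a minimal sketch that recomputes them from the counts printed in this report; the names `n_rows`, `zero_counts` and `zero_share` are illustrative and are not part of pandas-profiling.

```python
# Counts copied from the Alerts list; n_rows is the number of observations.
n_rows = 3832
zero_counts = {
    "major_discipline": 53,
    "experience": 107,
    "company_size": 371,
    "company_type": 137,
    "last_new_job": 1699,
}

# Share of zeros per column, in percent, rounded to one decimal place.
zero_share = {
    name: round(100 * count / n_rows, 1)
    for name, count in zero_counts.items()
}

for name, share in zero_share.items():
    print(f"{name}: {share}%")
```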

Reproduction

Analysis started: 2022-02-20 15:11:52.088152
Analysis finished: 2022-02-20 15:12:09.183580
Duration: 17.1 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct: 3832
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9646.476253
Minimum: 2
Maximum: 19155
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 2
5-th percentile: 1044.75
Q1: 4938.25
median: 9613.5
Q3: 14413.25
95-th percentile: 18234.45
Maximum: 19155
Range: 19153
Interquartile range (IQR): 9475

Descriptive statistics

Standard deviation: 5491.503126
Coefficient of variation (CV): 0.5692755553
Kurtosis: -1.18122303
Mean: 9646.476253
Median Absolute Deviation (MAD): 4742.5
Skewness: -0.006484189091
Sum: 36965297
Variance: 30156606.58
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
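
Several of the df_index statistics above can be derived from one another. As a rough consistency check, the sketch below recomputes the mean, coefficient of variation, variance, range and IQR from the other printed values. It only uses numbers copied from this report; the variable names are illustrative.

```python
# Values copied from the df_index statistics above.
n = 3832                 # number of observations
total = 36965297         # Sum
std = 5491.503126        # Standard deviation
q1, q3 = 4938.25, 14413.25
minimum, maximum = 2, 19155

mean = total / n                   # Mean = Sum / number of observations
cv = std / mean                    # Coefficient of variation = std / mean
variance = std ** 2                # Variance = std squared
iqr = q3 - q1                      # Interquartile range = Q3 - Q1
value_range = maximum - minimum    # Range = Maximum - Minimum

print(mean, cv, variance, iqr, value_range)
```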
Value                Count  Frequency (%)
686                  1      < 0.1%
7869                 1      < 0.1%
15302                1      < 0.1%
4882                 1      < 0.1%
3713                 1      < 0.1%
18621                1      < 0.1%
5923                 1      < 0.1%
11592                1      < 0.1%
3832                 1      < 0.1%
15667                1      < 0.1%
Other values (3822)  3822   99.7%

Value  Count  Frequency (%)
2      1      < 0.1%
9      1      < 0.1%
10     1      < 0.1%
11     1      < 0.1%
15     1      < 0.1%
17     1      < 0.1%
21     1      < 0.1%
24     1      < 0.1%
26     1      < 0.1%
32     1      < 0.1%

Value  Count  Frequency (%)
19155  1      < 0.1%
19149  1      < 0.1%
19135  1      < 0.1%
19132  1      < 0.1%
19129  1      < 0.1%
19128  1      < 0.1%
19126  1      < 0.1%
19124  1      < 0.1%
19123  1      < 0.1%
19122  1      < 0.1%

city
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 117
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 44.16310021
Minimum: 0
Maximum: 122
Zeros: 8
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 5
median: 48
Q3: 64
95-th percentile: 103
Maximum: 122
Range: 122
Interquartile range (IQR): 59

Descriptive statistics

Standard deviation: 35.27668561
Coefficient of variation (CV): 0.7987819117
Kurtosis: -1.048270934
Mean: 44.16310021
Median Absolute Deviation (MAD): 35
Skewness: 0.3802565906
Sum: 169233
Variance: 1244.444548
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
5                   862    22.5%
64                  574    15.0%
48                  285    7.4%
13                  274    7.2%
49                  166    4.3%
30                  127    3.3%
95                  84     2.2%
4                   61     1.6%
6                   58     1.5%
99                  58     1.5%
Other values (107)  1283   33.5%

Value  Count  Frequency (%)
0      8      0.2%
1      17     0.4%
2      55     1.4%
3      19     0.5%
4      61     1.6%
5      862    22.5%
6      58     1.5%
7      14     0.4%
8      3      0.1%
9      1      < 0.1%

Value  Count  Frequency (%)
122    15     0.4%
121    9      0.2%
120    16     0.4%
119    4      0.1%
118    7      0.2%
117    9      0.2%
116    44     1.1%
115    1      < 0.1%
114    11     0.3%
113    4      0.1%

city_development_index
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 88
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 62.11482255
Minimum: 0
Maximum: 92
Zeros: 2
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 14
Q1: 27
median: 80
Q3: 85
95-th percentile: 90
Maximum: 92
Range: 92
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 29.56445109
Coefficient of variation (CV): 0.4759645101
Kurtosis: -1.126743255
Mean: 62.11482255
Median Absolute Deviation (MAD): 9
Skewness: -0.760914858
Sum: 238024
Variance: 874.056768
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value              Count  Frequency (%)
85                 1028   26.8%
14                 574    15.0%
82                 285    7.4%
90                 274    7.2%
27                 137    3.6%
78                 127    3.3%
91                 86     2.2%
67                 84     2.2%
57                 61     1.6%
88                 58     1.5%
Other values (78)  1118   29.2%

Value  Count  Frequency (%)
0      2      0.1%
1      4      0.1%
3      2      0.1%
4      1      < 0.1%
5      1      < 0.1%
6      1      < 0.1%
7      17     0.4%
8      52     1.4%
9      16     0.4%
10     4      0.1%

Value  Count  Frequency (%)
92     9      0.2%
91     86     2.2%
90     274    7.2%
89     27     0.7%
88     58     1.5%
87     34     0.9%
86     1      < 0.1%
85     1028   26.8%
84     15     0.4%
83     42     1.1%

gender
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 224.7 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 0.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value  Count  Frequency (%)
1.0    3419   89.2%
0.0    372    9.7%
2.0    41     1.1%

Length

Histogram of lengths of the category

Pie chart


relevent_experience
Categorical

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 217.2 KiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 1
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value  Count  Frequency (%)
0      2754   71.9%
1      1078   28.1%

Length

Histogram of lengths of the category

Pie chart


enrolled_university
Categorical

Distinct: 3
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 224.7 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2.0
2nd row: 0.0
3rd row: 2.0
4th row: 2.0
5th row: 2.0

Common Values

Value  Count  Frequency (%)
2.0    2795   72.9%
0.0    796    20.8%
1.0    241    6.3%

Length

Histogram of lengths of the category

Pie chart


education_level
Categorical

Distinct: 5
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 224.7 KiB

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 1.0
4th row: 0.0
5th row: 0.0

Common Values

Value  Count  Frequency (%)
0.0    2385   62.2%
2.0    916    23.9%
1.0    401    10.5%
3.0    66     1.7%
4.0    64     1.7%

Length

Histogram of lengths of the category

Pie chart


major_discipline
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.717379958
Minimum: 0
Maximum: 5
Zeros: 53
Zeros (%): 1.4%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 2
Q1: 5
median: 5
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.9540366772
Coefficient of variation (CV): 0.2022386761
Kurtosis: 11.48611069
Mean: 4.717379958
Median Absolute Deviation (MAD): 0
Skewness: -3.505073471
Sum: 18077
Variance: 0.9101859814
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value  Count  Frequency (%)
5      3463   90.4%
2      140    3.7%
4      76     2.0%
1      61     1.6%
0      53     1.4%
3      39     1.0%

Value  Count  Frequency (%)
0      53     1.4%
1      61     1.6%
2      140    3.7%
3      39     1.0%
4      76     2.0%
5      3463   90.4%

Value  Count  Frequency (%)
5      3463   90.4%
4      76     2.0%
3      39     1.0%
2      140    3.7%
1      61     1.6%
0      53     1.4%

experience
Real number (ℝ≥0)

ZEROS

Distinct: 22
Distinct (%): 0.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 12.84916493
Minimum: 0
Maximum: 21
Zeros: 107
Zeros (%): 2.8%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 7
median: 14
Q3: 18
95-th percentile: 21
Maximum: 21
Range: 21
Interquartile range (IQR): 11

Descriptive statistics

Standard deviation: 6.579093656
Coefficient of variation (CV): 0.5120249988
Kurtosis: -0.9627078369
Mean: 12.84916493
Median Absolute Deviation (MAD): 5
Skewness: -0.4988774109
Sum: 49238
Variance: 43.28447333
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=22)
Value              Count  Frequency (%)
21                 623    16.3%
15                 302    7.9%
14                 290    7.6%
13                 260    6.8%
16                 239    6.2%
11                 227    5.9%
1                  201    5.2%
19                 199    5.2%
17                 187    4.9%
18                 185    4.8%
Other values (12)  1119   29.2%

Value  Count  Frequency (%)
0      107    2.8%
1      201    5.2%
2      130    3.4%
3      101    2.6%
4      87     2.3%
5      101    2.6%
6      148    3.9%
7      119    3.1%
8      72     1.9%
9      50     1.3%

Value  Count  Frequency (%)
21     623    16.3%
20     113    2.9%
19     199    5.2%
18     185    4.8%
17     187    4.9%
16     239    6.2%
15     302    7.9%
14     290    7.6%
13     260    6.8%
12     20     0.5%

company_size
Real number (ℝ≥0)

ZEROS

Distinct: 8
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.158663883
Minimum: 0
Maximum: 7
Zeros: 371
Zeros (%): 9.7%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1
median: 3
Q3: 4
95-th percentile: 7
Maximum: 7
Range: 7
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.028479838
Coefficient of variation (CV): 0.6421955335
Kurtosis: -0.7877589264
Mean: 3.158663883
Median Absolute Deviation (MAD): 1
Skewness: 0.2345289145
Sum: 12104
Variance: 4.114730451
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=8)
Value  Count  Frequency (%)
4      908    23.7%
1      656    17.1%
3      597    15.6%
2      458    12.0%
0      371    9.7%
7      334    8.7%
5      277    7.2%
6      231    6.0%

Value  Count  Frequency (%)
0      371    9.7%
1      656    17.1%
2      458    12.0%
3      597    15.6%
4      908    23.7%
5      277    7.2%
6      231    6.0%
7      334    8.7%

Value  Count  Frequency (%)
7      334    8.7%
6      231    6.0%
5      277    7.2%
4      908    23.7%
3      597    15.6%
2      458    12.0%
1      656    17.1%
0      371    9.7%

company_type
Real number (ℝ≥0)

ZEROS

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.393267223
Minimum: 0
Maximum: 5
Zeros: 137
Zeros (%): 3.6%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 5
median: 5
Q3: 5
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.336821295
Coefficient of variation (CV): 0.304288637
Kurtosis: 3.734393113
Mean: 4.393267223
Median Absolute Deviation (MAD): 0
Skewness: -2.251681479
Sum: 16835
Variance: 1.787091176
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value  Count  Frequency (%)
5      2925   76.3%
4      423    11.0%
1      203    5.3%
0      137    3.6%
2      117    3.1%
3      27     0.7%

Value  Count  Frequency (%)
0      137    3.6%
1      203    5.3%
2      117    3.1%
3      27     0.7%
4      423    11.0%
5      2925   76.3%

Value  Count  Frequency (%)
5      2925   76.3%
4      423    11.0%
3      27     0.7%
2      117    3.1%
1      203    5.3%
0      137    3.6%

last_new_job
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.740344468
Minimum: 0
Maximum: 5
Zeros: 1699
Zeros (%): 44.3%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 1
Q3: 4
95-th percentile: 5
Maximum: 5
Range: 5
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 1.934765045
Coefficient of variation (CV): 1.111713848
Kurtosis: -1.353111525
Mean: 1.740344468
Median Absolute Deviation (MAD): 1
Skewness: 0.5656052695
Sum: 6669
Variance: 3.743315778
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
Value  Count  Frequency (%)
0      1699   44.3%
4      662    17.3%
1      580    15.1%
5      486    12.7%
2      204    5.3%
3      201    5.2%

Value  Count  Frequency (%)
0      1699   44.3%
1      580    15.1%
2      204    5.3%
3      201    5.2%
4      662    17.3%
5      486    12.7%

Value  Count  Frequency (%)
5      486    12.7%
4      662    17.3%
3      201    5.2%
2      204    5.3%
1      580    15.1%
0      1699   44.3%

training_hours
Real number (ℝ≥0)

Distinct: 232
Distinct (%): 6.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 61.63726514
Minimum: 0
Maximum: 240
Zeros: 5
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 30.1 KiB

Quantile statistics

Minimum: 0
5-th percentile: 7
Q1: 22
median: 46
Q3: 87
95-th percentile: 171
Maximum: 240
Range: 240
Interquartile range (IQR): 65

Descriptive statistics

Standard deviation: 51.64722767
Coefficient of variation (CV): 0.8379221167
Kurtosis: 1.147776109
Mean: 61.63726514
Median Absolute Deviation (MAD): 29
Skewness: 1.263712933
Sum: 236194
Variance: 2667.436126
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value               Count  Frequency (%)
16                  66     1.7%
11                  64     1.7%
27                  63     1.6%
41                  62     1.6%
21                  62     1.6%
19                  60     1.6%
23                  58     1.5%
49                  55     1.4%
25                  53     1.4%
10                  52     1.4%
Other values (222)  3237   84.5%

Value  Count  Frequency (%)
0      5      0.1%
1      14     0.4%
2      18     0.5%
3      37     1.0%
4      22     0.6%
5      45     1.2%
6      48     1.3%
7      43     1.1%
8      50     1.3%
9      48     1.3%

Value  Count  Frequency (%)
240    4      0.1%
238    2      0.1%
237    2      0.1%
236    1      < 0.1%
235    5      0.1%
233    2      0.1%
232    3      0.1%
231    4      0.1%
230    4      0.1%
229    4      0.1%

Interactions

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
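
The calculation described above can be sketched in plain Python: rank both variables, then divide the covariance of the ranks by the product of their standard deviations. This is a minimal illustration, not the code pandas-profiling runs; the helper names `average_ranks` and `spearman_rho` are hypothetical, and ties are assumed to receive the average of the ranks they span.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables over the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# A nonlinear but strictly increasing relationship.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
print(rho)
```

Because only the ordering of the values matters, the squared series in the example is treated the same as a straight line.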

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
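
As a minimal sketch of this formula in plain Python (the function name `pearson_r` is illustrative, not a pandas-profiling API), the 1/n factors in the covariance and in the standard deviations cancel, so sums of deviations are enough:

```python
import math


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their std devs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)

# Shifting and rescaling y by a positive factor leaves r unchanged.
r_rescaled = pearson_r(x, [3 * v + 7 for v in y])
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale mentioned above.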

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
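
The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition in plain Python; the function name `kendall_tau_a` is illustrative. Note that libraries such as SciPy default to the tau-b variant, which adjusts the denominator for ties, so results can differ on tied data.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1      # both variables move in the same direction
        elif sign < 0:
            discordant += 1      # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither total
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Six pairs in total: five concordant, one discordant.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```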

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been proved to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
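
A minimal sketch of the bias-corrected estimator, assuming the commonly cited form of Bergsma's correction: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1) and floored at zero, and the table dimensions r and k are shrunk by (r-1)²/(n-1) and (k-1)²/(n-1). The function name `cramers_v_corrected` is illustrative, and this is not necessarily how pandas-profiling implements it.

```python
import math
from collections import Counter


def cramers_v_corrected(a, b):
    """Bias-corrected Cramér's V for two equally long categorical sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    rows, cols = Counter(a), Counter(b)
    r, k = len(rows), len(cols)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for row_value, row_total in rows.items():
        for col_value, col_total in cols.items():
            expected = row_total * col_total / n
            observed = joint.get((row_value, col_value), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    if denom <= 0:
        return 0.0  # degenerate table, e.g. a constant column
    return math.sqrt(phi2_corr / denom)


# Independent columns give 0; perfectly associated columns give 1.
v_independent = cramers_v_corrected(["x", "x", "y", "y"], ["p", "q", "p", "q"])
v_perfect = cramers_v_corrected(["x"] * 50 + ["y"] * 50, ["p"] * 50 + ["q"] * 50)
print(v_independent, v_perfect)
```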

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

df_index city city_development_index gender relevent_experience enrolled_university education_level major_discipline experience company_size company_type last_new_job training_hours
0686103911.002.00.05.03.02.05.00.041
11106116270.010.00.05.011.05.05.05.017
28966103911.002.01.05.015.07.05.00.021
3796138631.002.00.05.01.02.04.00.077
4518264141.002.00.05.021.07.05.04.077
51420779120.010.00.05.014.05.05.01.0144
6147835851.002.00.05.014.01.05.00.033
782149851.002.04.05.019.07.02.01.0153
857625851.011.01.05.011.05.05.00.046
91625013901.002.01.05.021.01.05.04.015

Last rows

df_index city city_development_index gender relevent_experience enrolled_university education_level major_discipline experience company_size company_type last_new_job training_hours
3822187592731.001.00.05.018.04.05.00.027
3823168815851.002.00.05.021.03.02.01.0197
38241164517291.001.02.05.01.02.05.02.018
38251007264141.002.00.05.017.02.05.00.062
3826110865851.002.00.03.021.02.05.04.043
382790555851.012.00.05.013.02.05.00.0171
38281120793211.000.00.05.021.03.05.00.07
3829167905851.012.00.05.06.05.04.02.0217
3830557570911.010.01.05.011.03.05.05.0180
38311038713901.002.02.05.021.01.05.04.071